
Problem 1

2.

Problem 2

Done in the TeX file.

Problem 3

2.

One simple example of a corruption process is adding zero-mean Gaussian noise, whose standard deviation can be varied; with a standard deviation of 0.1, the process is:

\begin{equation}\label{eq:aenoise} C(\hat{x}\vert x) = x + \mathcal{N}(0,0.1) \end{equation}

This function is implemented below on the curve $y = x^2$.
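As a minimal sketch of such a corruption function (the function name `corrupt` and the sample count are illustrative, not the actual implementation):

```python
import numpy as np

def corrupt(x, sigma=0.1, rng=None):
    """Corruption process C(x_hat | x): add zero-mean Gaussian noise
    with standard deviation sigma to the clean input x."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

# Apply the corruption to points sampled from y = x^2.
x = np.linspace(-1.0, 1.0, 200)
y = x ** 2
y_noisy = corrupt(y, sigma=0.1, rng=np.random.default_rng(0))
```

Plotting `y_noisy` against `x` shows the parabola blurred by the noise band, which is the input the denoising autoencoder must learn to invert.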

3.

We can see that the AE learned some of the salient features of the dataset, since there is separation between the different digits, especially those that look very different from one another. For example, 0 and 1 are far apart, as are 1 and 9, while 0 and 9 are closer together since they look more similar. One issue is that there is no clear separation among the remaining digits, so the AE could be optimized further. Furthermore, since we did not use a VAE, sampling from this latent space will not yield very reasonable examples.

Problem 4

2: MNIST

We can see that the reconstructions are much clearer after 10 epochs than after 1 epoch: the reconstruction loss decreased, and the outputs are more faithful to the original images.
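For reference, the objective being minimized here is the standard VAE loss: a reconstruction term plus the analytic KL divergence of the diagonal Gaussian posterior from the standard normal prior. A minimal numpy sketch (argument names like `mu` and `logvar` are illustrative; the actual implementation may differ):

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    """Per-example VAE loss: binary cross-entropy reconstruction term
    plus the closed-form KL divergence KL(N(mu, sigma^2) || N(0, 1))."""
    eps = 1e-7  # avoid log(0)
    bce = -np.sum(x * np.log(x_recon + eps)
                  + (1 - x) * np.log(1 - x_recon + eps))
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
    return bce + kl
```

Over training, the reconstruction term falls as the decoder sharpens, which matches the clearer digits we see at epoch 10.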

3: Fashion-MNIST

We performed a search over some of the hyperparameters of the VAE model, including learning rate, batch size, and number of epochs, while keeping the network architecture constant. The best-performing model has the attributes below:

Hyperparameter    Value
epochs            30
batch-size        128
learning-rate     0.0005
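The search itself can be sketched as a plain grid search; `train_vae` is a hypothetical stand-in for the actual training-and-validation routine (not shown here), and the grid values are illustrative:

```python
from itertools import product

# Illustrative search grid (not necessarily the exact one we ran).
grid = {
    "epochs": [10, 20, 30],
    "batch_size": [64, 128],
    "learning_rate": [1e-3, 5e-4],
}

def run_search(train_vae, grid):
    """Try every combination in the grid and return the configuration
    with the lowest validation loss reported by train_vae."""
    best_cfg, best_loss = None, float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = train_vae(**cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

A random search over the same ranges would be cheaper per unit of coverage, but with only three hyperparameters the full grid is still tractable.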

Problem 5

GAN: 1-3

Analysis of GAN Hyperparameters

We modified a number of hyperparameters and training schemes for GAN training. We recorded the first- and last-epoch generated samples for all of these runs in the results/gan_results folder in the project directory. Here we describe what we changed and how it impacted our results.


We modified the following five parameters: the generator and discriminator learning rates, doubling the training of either the generator or the discriminator, and $\beta_1$ for the Adam optimizer. Our baseline simply increases the number of epochs with the given learning rates and other parameters.

Starting with changing just the learning rates, we first lowered only the generator learning rate: its loss was quite low, so we wanted to slow it down to better balance the minimax game. We get:

There does not seem to be much difference from lowering the generator learning rate. One thing we do see is fewer shoe-shaped images, which might be related to mode collapse, though it is unclear how the learning rate would cause this; it may simply be due to random initialization. Next, reducing both learning rates shows that we simply needed lower learning rates: this gives us clearer generations.

Changing Training Scheme

We next experimented with the training scheme. When we trained the generator twice per discriminator update, we obtained very strange outputs that are difficult to explain. The losses look normal, but the gradients may have exploded or collapsed. We show both the first and last epoch: in this case, the first epoch seems to focus on the correct areas of the image for shirts, but the outputs fall apart by the last epoch.

However, changing the scheme to train the discriminator twice as often as the generator produced clearer pictures than changing the learning rates did.
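The alternating schedule can be sketched as a simple loop; `d_step` and `g_step` are hypothetical placeholders for the real discriminator and generator optimization steps, which are not shown here:

```python
def train_gan(d_step, g_step, n_batches, d_updates=2):
    """Alternating GAN loop where the discriminator is updated
    d_updates times for every single generator update."""
    for _ in range(n_batches):
        for _ in range(d_updates):
            d_step()  # update D on real + fake batches
        g_step()      # update G against the refreshed D
```

Giving the discriminator extra updates keeps its feedback signal ahead of the generator, which is one common heuristic for stabilizing the minimax game.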

Our best-performing model occurred when we adjusted the $\beta_1$ value of the Adam optimizer.

Instead of fuzzy short- and long-sleeve shirts, we see clear separation between short and long sleeves. However, we still have mode collapse, as we do not see any shoes at all. This is resolved in the CGAN.
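For context, $\beta_1$ is the decay rate of Adam's first-moment (momentum) estimate; lowering it (e.g. from 0.9 to 0.5, a common GAN heuristic) makes updates react faster to the rapidly shifting gradients of the minimax game. A minimal numpy sketch of a single Adam step (the hyperparameter values shown are illustrative):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update. beta1 decays the first-moment estimate m,
    beta2 the second-moment estimate v; t is the 1-based step count
    used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected momentum
    v_hat = v / (1 - beta2 ** t)  # bias-corrected variance
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

With a smaller $\beta_1$, the momentum term forgets stale gradient directions sooner, which can help when the discriminator's loss surface changes every step.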

4-5: Conditional GAN

Analysis of CGAN Hyperparameters

Using the best parameters from the GAN and changing the GAN into a conditional GAN allowed us to fix the mode collapse issue, as seen below.

In this case, we can clearly see that the pants, shirts, and shoes are all relatively clear compared to our best-performing GAN architectures. Unlike the GAN, we no longer end up with only tops; we also see handbags and shoes. The CGAN resolved the mode collapse issue, producing a better relative abundance of each class. However, compared to the best VAE, the pictures are still much blurrier. There are clearer edge separations in the CGAN than in the VAE, but some images are completely obfuscated, like the sandal in the third-to-last picture of the bottom row. In this case we might prefer the VAE, though empirically GANs tend to perform better, so this may be an issue of not searching the parameter space thoroughly enough.
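The conditioning mechanism itself can be sketched very simply: the generator's input becomes the noise vector concatenated with a one-hot class label, so it is explicitly asked to produce every class rather than being free to collapse onto a few modes. A minimal numpy sketch (the 100-dimensional noise size and function name are illustrative, not the actual implementation):

```python
import numpy as np

def conditional_input(z, labels, n_classes=10):
    """Concatenate a one-hot class label onto each noise vector: the
    simplest CGAN conditioning scheme, where the generator sees (z, y)."""
    onehot = np.eye(n_classes)[labels]  # (batch, n_classes)
    return np.concatenate([z, onehot], axis=1)

# Example: 4 noise vectors, each tagged with a Fashion-MNIST class id.
z = np.random.default_rng(0).normal(size=(4, 100))
gen_in = conditional_input(z, np.array([0, 1, 2, 3]))
```

The discriminator receives the same label alongside the image, so a realistic shirt presented with the "sandal" label is still rejected, which is what forces per-class coverage.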